ABSTRACT

Three procedures for evaluating the sampling 
specificity of results are reviewed. These procedures are Tukey's
jackknife technique, Efron's bootstrap technique, and cross-validation
methods. The jackknife technique uses different subsamples derived
from the original total data set to provide empirical estimates of 
the generalizability of effect sizes. The bootstrap technique 
estimates the statistical accuracy of effect size estimates and 
creates a megafile by copying the original data sample over and over 
again many times. The researcher then randomly selects a given number 
of bootstrap samples of size "n" from the megafile. The effect size 
is computed for each bootstrap sample; these correlation coefficients 
are treated as a distribution from which statistical estimates of 
result stability are derived. Cross-validation methods involve the 
arbitrary splitting of a sample. Prediction equations developed for 
each group are crossed so that each group will use the other group's 
prediction equations. A small data set is used to illustrate in more 
detail how the cross-validation procedure is performed and
interpreted. Two data tables and sample Statistical Analysis System 
commands are provided. (TJH) 
ABSTRACT

Three procedures for evaluating the sampling
specificity of results are reviewed: a) Tukey's
jackknife technique, b) Efron's bootstrap technique, and
c) cross-validation methods. Each procedure is briefly
explained. A small data set is employed to illustrate
in more detail how the cross-validation procedure is
performed and interpreted.



Carver (1978) notes that "replication is the
cornerstone of science" (p. 392) (Bauernfeind, 1968;
Smith, 1970). The replication of research findings
informs the researcher of the generalizability of
obtained findings (Smith, 1970). Researchers need to
know that observed effects are "true effects". Through 
investigation researchers endeavor to determine: a) the 
validity of sample-based results with respect to the 
broader population of interest; b) the stability of 
calculated findings derived from sample estimates; and 
c) the nature of the relationship between independent 
variables and observed phenomena. 

Crask and Perreault (1977) point out that failure 
to determine the validity of sample-based results and 
the stability of calculated findings may lead to the 
reporting of inaccurate results based upon sample 
specific findings. The researcher's desire to 
demonstrate that results are replicable, i.e., that 
results are not based upon chance, has led to a
reliance upon statistical significance testing. Carver
(1978) reports that interest in evaluating replicability
is one of the two most influential reasons why 
statistical significance testing flourishes. Yet, the 
interpretation of statistical significance as
reflecting the probability that results will be 
replicated has no basis whatsoever in the logic of 
significance testing (Carver, 1978). 

According to Carver, statistical significance
testing limits the researcher to decisions of rejecting
or failing to reject the null hypothesis given a
probability of obtaining sample results under an
assumption that the null hypothesis is exactly true.
Therefore, the interpretation that the probability value
reflects the replicability or reliability of results is
completely erroneous.

Thompson (in press) has demonstrated through the 
use of example data how reliance upon statistical 
significance testing may mislead the researcher in the 
interpretation of results. Thompson employed varying 
sample sizes in illustrating that the value of the
effect size remained unchanged even when the sample size 
increased; however, the interpretation of statistical 
significance did change as a function of the increase in 
sample size. The researcher basing decisions on 
statistical significance may ignore large effect sizes 
that are not significant while overinterpreting effect
sizes that are small but statistically significant.
Carver (1978) points out that effect sizes and
significance testing do not inform the researcher of the
likelihood of the replication of results. From a
scientific point of view, it is more desirable to have a
moderate effect size which is very stable or replicable
rather than a large effect size which may be
statistically significant but not stable or replicable.
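
The dependence of statistical significance on sample size can be sketched numerically. The example below is only an illustration and does not use Thompson's data: the correlation of .30 and the three sample sizes are hypothetical values chosen for this sketch, and the critical values are the standard two-tailed .05 entries from a t table. The function name t_for_r is likewise an assumption of the sketch.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, given a sample Pearson r and n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Two-tailed critical t values at alpha = .05 for df = n - 2
# (standard table values for df = 8, 48, and 98).
CRITICAL_T = {10: 2.306, 50: 2.011, 100: 1.984}

r = 0.30  # hypothetical effect size, held constant across sample sizes
results = {}
for n, crit in CRITICAL_T.items():
    t = t_for_r(r, n)
    results[n] = (t, abs(t) > crit)
    print(f"n={n:3d}  r={r:.2f}  t={t:.3f}  significant at .05: {abs(t) > crit}")
```

The effect size is identical in all three rows; only the decision about the null hypothesis changes as n grows.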

There are three procedures for evaluating the
sampling specificity of results: a) Tukey's jackknife
technique, b) Efron's bootstrap technique, and c)
cross-validation methods. This paper will describe how each
procedure is performed and how results of each procedure
are interpreted. A small data set developed by Thompson
(in press) is employed to illustrate in more detail how
the cross-validation procedure is performed and
interpreted.
Jackknife Technique

The jackknife technique employs different
subsamples derived from the original total data set to
provide empirical estimates of the generalizability of
effect sizes (Ayabe, 1985; Crask & Perreault, 1977).
The stability of the jackknife estimate across subsamples
of the total data sample is interpreted by the
researcher as an indicator of the reliability and
replicability of the effect size obtained from the total
sample.

The jackknife procedure is carried out by first
computing the effect size for the entire sample, and
then recomputing the statistic of interest n times, each
time dropping a different observation from the sample
(Ayabe, 1985). By repetitively dropping one observation
at a time, the researcher is able to determine
fluctuations in sampling error which may be attributed
to the uniqueness of the single observation dropped or
to the combined characteristics of the subsample. The
standard deviations of estimated effect sizes derived
with different subsamples indicate sampling error and
enable the researcher to determine the stability of
jackknife estimates. Crask and Perreault (1977) may be
referred to for a readable presentation of the jackknife
technique.
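
The leave-one-out steps described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited author: it takes the P and DV values of the seven cases in Table 1, uses the Pearson correlation between them as the effect size, and summarizes stability with the standard deviation of the n recomputed values, as the paragraph above describes, rather than with Tukey's pseudovalue formulation. The function names are assumptions of the sketch.

```python
import math
import statistics

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jackknife_r(x, y):
    """Recompute r n times, each time dropping a different observation."""
    return [
        pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]

# P and DV for the seven cases in Table 1.
p = [1, 2, 3, 4, 5, 6, 7]
dv = [90, 49, 93, 20, 3, 39, 63]

full_r = pearson_r(p, dv)         # effect size for the entire sample
loo_rs = jackknife_r(p, dv)       # one estimate per dropped case
spread = statistics.stdev(loo_rs) # variability across the subsamples

print(f"full-sample r = {full_r:.4f}")
for i, r in enumerate(loo_rs, start=1):
    print(f"dropping case {i}: r = {r:.4f}")
print(f"SD of leave-one-out estimates = {spread:.4f}")
```

A small standard deviation relative to the full-sample estimate would suggest that no single case is driving the result.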
Bootstrap Technique 

Like the jackknife technique, the bootstrap gives an
estimate of the statistical accuracy of effect size
estimates (Diaconis & Efron, 1983). However, in the
bootstrap technique, a megafile is created by copying
the original data sample over and over again an
extraordinary number of times. The researcher then
randomly selects a given number of bootstrap samples of
size n from the megafile. The effect size is computed
for each bootstrap sample. These bootstrap correlation
coefficients can be treated as a distribution from which
statistical estimates of result stability may be derived
(Diaconis & Efron, 1983).

The bootstrap technique is especially useful to 
the researcher when a large or moderate effect size is 
obtained, but a statistically nonsignificant finding has 
occurred due to a small sample size. In this case, the 
researcher can determine the replicability of results by
performing the bootstrap. An example illustrating the 
application of the bootstrap is provided by Diaconis and 
Efron (1983). 
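
A minimal sketch of the resampling logic follows. Rather than building a literal megafile, it draws n cases with replacement from the original sample, which approximates drawing from a very large file of copies. The data are the P and DV values from Table 1; the number of bootstrap samples (1,000), the random seed, the rough percentile interval, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Product-moment correlation; returns None when a variable has no variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def bootstrap_rs(x, y, n_samples=1000, seed=0):
    """Draw bootstrap samples of size n with replacement and compute r for each."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_samples:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # redraw degenerate samples with zero variance
            estimates.append(r)
    return estimates

# P and DV for the seven cases in Table 1.
p = [1, 2, 3, 4, 5, 6, 7]
dv = [90, 49, 93, 20, 3, 39, 63]

rs = sorted(bootstrap_rs(p, dv))
lower = rs[len(rs) * 25 // 1000]       # rough 2.5th percentile
upper = rs[len(rs) * 975 // 1000 - 1]  # rough 97.5th percentile

print(f"mean of bootstrap r values = {statistics.fmean(rs):.4f}")
print(f"SD of bootstrap r values   = {statistics.stdev(rs):.4f}")
print(f"rough 95% percentile interval: [{lower:.4f}, {upper:.4f}]")
```

With only seven cases the bootstrap distribution will be wide, which is itself evidence about how far the sample estimate can be trusted.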
Cross-validation 

Cross-validation methods involve the arbitrary 
splitting of a sample. The sample may be split in half, 
or the sample may be split in other proportions such as 
sixty percent and forty percent. In cross-validation, 
the prediction equations developed for each of the split 
groups are "crossed" so that each group will use the
other group's prediction equations (Ayabe, 1985). The 
researcher wishes to determine two things: a) which 
beta weights (or related weights) will best predict the 
dependent variable from the predictor variables, and b) 
how stable in prediction is the effect size estimate.
To make this discussion concrete, the use of
cross-validation in regression research will be discussed in
more detail. 

To perform the cross-validation procedure, the
researcher must carry out two computer runs. The first
run is conducted to derive the means and standard
deviations for the total group and the two subgroups,
which may also be referred to as invariance groups
(Thompson, in press). The CORRELATION and MULTIPLE
REGRESSION procedures are also run for the total group
and for both invariance groups. The multiple
correlation coefficient for the total group serves as
the basis for ultimate interpretation if the results
prove to be stable or invariant.

In the second computer run, the researcher creates
new variables (i.e., Z scores and YHAT predicted
scores). The Z scores for invariance groups use the
means and standard deviations for each of their
respective groups. Two sets of YHAT values are created:
a) using their invariance group's data and beta weights,
and b) using their group's data and the other group's
beta weights. Invariance results are obtained by
running the CORRELATION procedure for all the YHAT
values and the dependent variable. Appendix A presents the
SAS commands used to conduct the empirical analysis of
Thompson's (in press) data set.
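
The second-run computations can be sketched outside SAS as follows. This is an illustrative restatement, not a replacement for the Appendix A commands: the predictor values come from Table 1 and the standardized weights are the constants that appear in Appendix A, while the function names are assumptions of the sketch. Because the means and standard deviations are computed here at full precision instead of using the rounded constants in the SAS code, the predicted scores agree with Table 1 only to about three decimal places.

```python
import statistics

def z_scores(values):
    """Standardize within a group using that group's own mean and sample SD."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)  # n - 1 denominator, as in the Appendix A constants
    return [(v - m) / s for v in values]

def predict(zp, zr, beta_p, beta_r):
    """Predicted standardized criterion scores from two standardized predictors."""
    return [beta_p * a + beta_r * b for a, b in zip(zp, zr)]

# Table 1 predictors, split by invariance group.
p1, r1 = [1, 2, 3], [3, 6, 1]
p2, r2 = [4, 5, 6, 7], [8, 4, 0, 9]

# Standardized (beta) weights as they appear in Appendix A.
BETA_G1 = (-0.371189, -1.087694)  # weights used for YHAT11 and YHAT21
BETA_G2 = (0.83549, 0.286434)     # weights used for YHAT12 and YHAT22

zp1, zr1 = z_scores(p1), z_scores(r1)
zp2, zr2 = z_scores(p2), z_scores(r2)

yhat11 = predict(zp1, zr1, *BETA_G1)  # group 1 data, group 1 weights
yhat12 = predict(zp1, zr1, *BETA_G2)  # group 1 data, group 2 weights
yhat21 = predict(zp2, zr2, *BETA_G1)  # group 2 data, group 1 weights
yhat22 = predict(zp2, zr2, *BETA_G2)  # group 2 data, group 2 weights

for name, values in [("YHAT11", yhat11), ("YHAT12", yhat12),
                     ("YHAT21", yhat21), ("YHAT22", yhat22)]:
    print(name, [round(v, 4) for v in values])
```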

The researcher interprets the cross-validation
results through a comparison of the multiple correlation
coefficients, "shrunken" multiple correlation
coefficients, and cross-validation or invariance
correlation coefficients. In the cross-validation of
results, the multiple correlation coefficients are
obtained from a product-moment correlation between each
subgroup's predicted scores derived using its own
weights and that subgroup's criterion scores (Krus &
Fuller, 1982). The "shrunken R" is obtained through the
product-moment correlation between predicted scores of
the subgroups derived using the other group's weights
and the criterion scores. The cross-validation
coefficient represents the product-moment correlation
between the predicted scores of a subgroup when the
subgroup's own weights are applied and its predicted
scores when the other subgroup's prediction equation is
applied.

While both the multiple correlation coefficient and 
the "shrunken R" represent correlations between the
subgroup's predicted scores and criterion scores, the 
cross-validation or invariance coefficient represents 
the correlation between two sets of predicted scores. 
The researcher always hopes that the cross-validation or 
invariance coefficient will equal one. 

The researcher looks for stability of the multiple
R across subsamples and for the effects of "crossing"
the regression equations for subgroups (i.e., "shrunken
R"). If multiple R coefficients are comparable, then
the researcher has some evidence for the replicability
of results and for the representativeness of the sample
(Krus & Fuller, 1982). However, the invariance
coefficients can be directly interpreted, always against
the standard of how close to one they are.

Table 1 presents a small data set for a multiple
regression problem developed by Thompson (in press).
Two variables, "P" and "R", are used to predict "DV",
the dependent variable. The first three subjects were
randomly assigned to the first invariance subgroup
("INV" = "1"). The last four subjects were assigned to
the second invariance subgroup ("INV" = "2").



INSERT TABLE 1 ABOUT HERE. 

The invariance results produced by the CORRELATION
procedure are presented in Table 2. The multiple
correlation coefficients for the invariance groups are
high, positive, and comparable; however, the "shrunken
R" values for the invariance groups are negative, which
indicates that the regression equations are not
generalizable across subgroups and therefore will not be
generalizable to broader populations of interest. The
invariance coefficients are also negative, which
indicates a high degree of sampling error between
subgroups. These data demonstrate that results are not
replicable across subsamples.
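
The three kinds of coefficients reported in Table 2 can be approximately reproduced from the scores in Table 1. The sketch below takes the DV and YHAT values as they are printed in Table 1, so its results match Table 2 only to within rounding; the function and dictionary key names are assumptions made for illustration.

```python
import math
import statistics

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invariance_summary(dv, own, crossed):
    """own: predictions from the group's own weights; crossed: from the other group's."""
    return {
        "multiple_R": pearson_r(own, dv),      # own-weight predictions vs. criterion
        "shrunken_R": pearson_r(crossed, dv),  # crossed-weight predictions vs. criterion
        "invariance": pearson_r(own, crossed), # the two sets of predictions
    }

# Values as printed in Table 1.
dv1 = [90, 49, 93]
yhat11 = [0.5151, -1.1525, 0.6370]
yhat12 = [-0.87339, 0.30350, 0.57000]
dv2 = [20, 3, 39, 63]
yhat21 = [-0.296, 0.474, 1.245, -1.423]
yhat22 = [-0.779, -0.411, -0.042, 1.232]

g1 = invariance_summary(dv1, own=yhat11, crossed=yhat12)
g2 = invariance_summary(dv2, own=yhat22, crossed=yhat21)

for label, g in [("group 1", g1), ("group 2", g2)]:
    print(label, {k: round(v, 4) for k, v in g.items()})
```

In each group the "shrunken R" is the product of the multiple R and the invariance coefficient, which is why a negative invariance coefficient necessarily accompanies a negative "shrunken R" here.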



INSERT TABLE 2 ABOUT HERE.



Thompson (in press) emphasizes the importance of 
empirically investigating result replicability rather 
than subjectively comparing the stability of multiple R 
across subgroups. The cross-validation procedure will
benefit the researcher wishing to demonstrate the
replicability of results and the generalizability of
sample characteristics. 
Summary 

Researchers need to know that observed effects are
"true effects". Statistical significance testing does 
not inform the researcher regarding the important issue 
of whether results will generalize. The interpretation
of statistical probability values as an indication of 
the likelihood that results will be replicated exceeds 
the logic of statistical reasoning. Further, reliance 
upon statistical significance testing may mislead the 
researcher in the interpretation of results. The 
researcher basing decisions on statistical significance 
may ignore large effect sizes that are not significant 
while overinterpreting effect sizes that are small but
statistically significant. 

Three procedures for evaluating the sampling
specificity of results are reviewed: a) Tukey's
jackknife technique, b) Efron's bootstrap technique, and
c) cross-validation methods. Each procedure is briefly
explained. A small data set is employed to illustrate
in more detail how the cross-validation procedure is
performed and interpreted.

Table 1

Observed and Latent Variables for Thompson's Data Set

INV  P  R  DV  ZP1     ZR1   YHAT11   YHAT12     ZP2     ZR2  YHAT21  YHAT22
  1  1  3  90   -1  -.1323    .5151  -.87339       .       .       .       .
  1  2  6  49    0  1.0595  -1.1525   .30350       .       .       .       .
  1  3  1  93    1  -.9269    .6370   .57000       .       .       .       .
  2  4  8  20    .       .        .        .  -1.162    .669   -.296   -.779
  2  5  4   3    .       .        .        .   -.387   -.304    .474   -.411
  2  6  0  39    .       .        .        .    .387  -1.276   1.245   -.042
  2  7  9  63    .       .        .        .   1.162    .912  -1.423   1.232


Table 2

Invariance Statistics

           DV          YHAT11      YHAT12      YHAT21

YHAT11    1.0000 a
          (n=3)

YHAT12    -.2842 b     -.2843 c
          (n=3)        (n=3)

YHAT21    -.5182 b
          (n=4)

YHAT22     .8747 a                             -.5924 c
          (n=4)                                (n=4)

a The multiple correlation coefficient R for the invariance group.
b The "shrunken R" for the invariance group.
c The invariance coefficient for the invariance group.

APPENDIX A: Example SAS Commands for Table 1 Data

DATA INVAR;
  INFILE INV;
  INPUT INV 1-2 P 4-5 R 7-8 DV 10-11;
  /* Second run: z scores and predicted scores for each invariance group */
  if inv=1 then do;
    zp1=(p-2.0)/1.0;
    zr1=(r-3.333)/2.517;
    yhat11=(-.371189*zp1)+(-1.087694*zr1);
    yhat12=(.83549*zp1)+(.286434*zr1);
  End;
  Else Do;
    zp2=(p-5.5)/1.291;
    zr2=(r-5.25)/4.113;
    yhat21=(-.371189*zp2)+(-1.087694*zr2);
    yhat22=(.83549*zp2)+(.286434*zr2);
  End;
PROC PRINT;
PROC MEANS;
  VAR P R DV;
PROC CORR;
  VAR P R DV;
  TITLE1 'DESCRIPTIVE STATISTICS FOR ALL DATA';
PROC REG;
  MODEL DV=P R/STB;
  TITLE1 'REGRESSION USING ALL DATA';

/* Subgroups are selected on the variable INV, not the data set name INVAR */
DATA TEMP1;
  SET INVAR;
  IF INV=1;
PROC CORR;
  VAR P R DV;
  TITLE1 'DESCRIPTIVE STATISTICS FOR SUBGROUP ONE';
PROC REG;
  MODEL DV=P R/STB;
  TITLE1 'REGRESSION FOR SUBGROUP ONE';

DATA TEMP2;
  SET INVAR;
  IF INV=2;
PROC CORR;
  VAR P R DV;
  TITLE1 'DESCRIPTIVE STATISTICS FOR SUBGROUP TWO';
PROC REG;
  MODEL DV=P R/STB;
  TITLE1 'REGRESSION FOR SUBGROUP TWO';

/* Second run: correlate the criterion with all predicted scores */
Data All;
  Set Invar;
Proc Corr;
  Var DV YHAT11 YHAT12 YHAT21 YHAT22;
  Title1 'INVARIANCE RESULTS';

Note. The analysis requires two runs. The first run
includes procedures typed in all capital letters. The
second run procedures are typed in bold type. The newly
created variables for the second run are typed in lower
case.